---
title: Rate Limiting Architecture
---

Rate limiting is a critical part of our infrastructure: it has to be accurate and robust for our customers. This document describes our rate limiting architecture and explains each component and concept in detail.

## Overview

The Unkey API implements a distributed rate limiting system using a sliding window algorithm with Redis-backed coordination. This architecture provides high accuracy while maintaining low latency and horizontal scalability across multiple nodes and regions.

## Architectural Components

### Sliding Window Algorithm

The system uses a sliding window algorithm, which is more accurate than fixed window approaches:

1. **Time Windows**: Time is divided into fixed-duration windows (e.g., 1-minute intervals)
2. **Weighted Calculation**: For each request, the system calculates an effective count by combining:
   - 100% of the current window's count
   - A decreasing portion of the previous window's count based on how far we are into the current window
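3. **Smooth Transitions**: This approach avoids the "cliff effect" of fixed windows, where counters reset at each boundary and a client can briefly send up to twice the limit by bursting just before and just after the reset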
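
The weighted calculation can be sketched as follows. This is a minimal illustration, assuming the previous window's weight falls linearly from 1 to 0 as the current window elapses; the function and parameter names are hypothetical and not taken from the Unkey codebase.

```python
def effective_count(current_count: int, previous_count: int,
                    elapsed_ms: int, duration_ms: int) -> float:
    """Estimate the number of requests in the trailing window.

    Assumption: the previous window's contribution decays linearly.
    """
    if duration_ms <= 0:
        raise ValueError("duration_ms must be positive")
    # Fraction of the current window that has elapsed, clamped to [0, 1].
    progress = min(max(elapsed_ms / duration_ms, 0.0), 1.0)
    previous_weight = 1.0 - progress
    return current_count + previous_count * previous_weight


def allow(current_count: int, previous_count: int, elapsed_ms: int,
          duration_ms: int, limit: int, cost: int = 1) -> bool:
    """Allow the request only if adding its cost stays within the limit."""
    used = effective_count(current_count, previous_count, elapsed_ms, duration_ms)
    return used + cost <= limit
```

For example, 15 seconds into a 60-second window with 10 requests in the current window and 40 in the previous one, the previous window is weighted at 0.75, giving an effective count of 10 + 30 = 40.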

### Redis as Centralized Counter Store

Redis serves as the authoritative source of truth for rate limit counters across all API nodes:

1. **Distributed Counters**: Redis maintains atomic counters for each rate limit window
2. **TTL Management**: Counters expire automatically once their window is no longer needed
3. **Atomic Operations**: Redis' atomic operations ensure consistent counting across nodes
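
One possible counter layout is sketched below. The key format, the grace period, and the helper names are assumptions made for illustration; the actual key scheme is not described in this document. The TTL reasoning is that a counter is read first as the current window and then as the previous window, so it has to survive two full durations.

```python
def counter_key(identifier: str, duration_ms: int, sequence: int) -> str:
    """Build a per-window counter key (hypothetical format)."""
    return f"ratelimit:{identifier}:{duration_ms}:{sequence}"


def counter_ttl_ms(duration_ms: int, grace_ms: int = 1_000) -> int:
    """TTL long enough for the counter to serve as current and previous window.

    The grace period is an assumed safety margin for clock skew between nodes.
    """
    return 2 * duration_ms + grace_ms
```

With a Redis client, an update would then typically be an `INCRBY` on the key followed by a `PEXPIRE` with this TTL.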

### Local In-Memory State

Each API node maintains local in-memory state for fast decision making:

1. **Buckets**: Track rate limit state for each unique identifier+limit+duration combination
2. **Windows**: Time-based counters that maintain accurate request counts
3. **Memory Management**: Background processes clean up expired windows to prevent memory leaks
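
The relationship between buckets, windows, and cleanup can be sketched like this. All class and method names are hypothetical, and the sketch assumes that only the current and previous windows are ever needed, so anything older can be dropped. A real implementation would also need locking for concurrent access.

```python
from dataclasses import dataclass, field


@dataclass
class Window:
    sequence: int       # index of the window since the epoch
    start_ms: int       # aligned start of the window
    duration_ms: int
    counter: int = 0


@dataclass
class Bucket:
    limit: int
    duration_ms: int
    windows: dict = field(default_factory=dict)  # sequence -> Window

    def get_window(self, now_ms: int) -> Window:
        """Return the window containing now_ms, creating it if needed."""
        sequence = now_ms // self.duration_ms
        window = self.windows.get(sequence)
        if window is None:
            window = Window(sequence, sequence * self.duration_ms, self.duration_ms)
            self.windows[sequence] = window
        return window


class LocalState:
    def __init__(self) -> None:
        self.buckets = {}  # (identifier, limit, duration_ms) -> Bucket

    def get_bucket(self, identifier: str, limit: int, duration_ms: int) -> Bucket:
        key = (identifier, limit, duration_ms)
        if key not in self.buckets:
            self.buckets[key] = Bucket(limit, duration_ms)
        return self.buckets[key]

    def cleanup(self, now_ms: int) -> int:
        """Drop windows older than the previous one; return how many were removed."""
        removed = 0
        for key in list(self.buckets):
            bucket = self.buckets[key]
            oldest_needed = now_ms // bucket.duration_ms - 1
            stale = [seq for seq in bucket.windows if seq < oldest_needed]
            for seq in stale:
                del bucket.windows[seq]
            removed += len(stale)
            if not bucket.windows:
                del self.buckets[key]  # no live windows left for this bucket
        return removed
```

In this sketch `cleanup` would be called periodically from a background task rather than on the request path.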

## Request Flow

### 1. Initial Rate Limit Check

When a rate limit request arrives:

1. The node validates the request parameters (identifier, limit, duration)
2. It retrieves or creates local buckets and windows for the current and previous time periods
3. For new windows, or after a limit has been exceeded, it synchronizes state with Redis
4. It calculates the effective count using the sliding window algorithm
5. It makes a local decision (allow/deny) based on this calculation

### 2. Asynchronous Synchronization

After making a local decision:

1. The node buffers the request for asynchronous processing
2. Background workers send counter updates to Redis
3. This ensures eventual consistency without adding latency to API responses
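
A minimal sketch of this buffering pattern, using a bounded queue and a single background worker, is shown below. The `AsyncSyncer` class and the `push` callback are hypothetical; `push` stands in for whatever call sends an increment to Redis. Dropping updates when the buffer is full is an assumption of this sketch, not documented behavior.

```python
import queue
import threading
from collections import Counter


class AsyncSyncer:
    """Buffers counter increments and applies them from a background worker."""

    def __init__(self, push, buffer_size: int = 10_000) -> None:
        # push(key, amount) is a hypothetical callback that updates Redis.
        self._push = push
        self._buffer = queue.Queue(maxsize=buffer_size)
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def record(self, key: str, amount: int = 1) -> bool:
        """Called on the request path; never blocks. False means the buffer was full."""
        try:
            self._buffer.put_nowait((key, amount))
            return True
        except queue.Full:
            return False

    def _run(self) -> None:
        while True:
            key, amount = self._buffer.get()
            try:
                self._push(key, amount)
            except Exception:
                # A real implementation would log the error and possibly retry.
                pass
            finally:
                self._buffer.task_done()

    def flush(self) -> None:
        """Block until every buffered update has been processed."""
        self._buffer.join()


# Usage: an in-memory Counter stands in for Redis.
store = Counter()
syncer = AsyncSyncer(lambda key, amount: store.update({key: amount}))
syncer.record("user_123", 3)
syncer.flush()
```

The request path only pays for a non-blocking enqueue, while the cost of the remote update is absorbed by the worker.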

### 3. Redis Synchronization

The system synchronizes with Redis in these scenarios:

1. When a new time window is encountered
2. After a rate limit has been exceeded (strict mode)
3. Periodically through background processes

## Implementation Details

### Bucket and Window Management

- **Buckets**: Maintain rate limit state for each identifier+limit+duration
- **Windows**: Aligned to time boundaries (e.g., minute start) for consistent behavior
- **Sequence Numbers**: Monotonically increasing numbers identify windows across nodes
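
One way to obtain aligned windows and sequence numbers that every node agrees on is to derive both from the wall clock. The sketch below assumes the sequence number is the Unix timestamp divided by the window duration, which is an illustrative choice rather than a documented detail; it also assumes node clocks are reasonably synchronized.

```python
def window_sequence(now_ms: int, duration_ms: int) -> int:
    """Sequence number of the window containing now_ms (assumed derivation)."""
    return now_ms // duration_ms


def window_bounds(sequence: int, duration_ms: int) -> tuple:
    """Return (start_ms, end_ms) of a window; end_ms is exclusive."""
    start_ms = sequence * duration_ms
    return start_ms, start_ms + duration_ms


def elapsed_in_window(now_ms: int, duration_ms: int) -> int:
    """Milliseconds elapsed since the start of the current window."""
    return now_ms % duration_ms
```

Because the sequence is a pure function of time and duration, two nodes handling requests in the same minute compute the same window without coordinating, and the previous window is always `sequence - 1`.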

### Resilience Features

- **Circuit Breaker**: Prevents cascading failures during Redis communication issues
- **Eventual Consistency**: System remains operational even during temporary Redis unavailability
- **Memory Management**: Automatic cleanup of expired windows prevents memory leaks
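
The circuit breaker idea can be illustrated with a small sketch. The thresholds, the class, and the `fetch_remote_count` helper are all hypothetical; the fallback to the local count reflects the eventual consistency behavior described above, not a specific implementation.

```python
import time


class CircuitBreaker:
    """Opens after repeated failures and permits trial calls after a timeout."""

    def __init__(self, failure_threshold: int = 5, reset_timeout_s: float = 10.0,
                 clock=time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def allow_request(self) -> bool:
        if self._opened_at is None:
            return True
        # Once the timeout has passed, let calls through again (half-open).
        return self._clock() - self._opened_at >= self.reset_timeout_s

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()


def fetch_remote_count(breaker: CircuitBreaker, fetch, local_count: int) -> int:
    """Return the remote count, or the local count when Redis is unavailable."""
    if not breaker.allow_request():
        return local_count  # circuit open: skip the call entirely
    try:
        value = fetch()
    except Exception:
        breaker.record_failure()
        return local_count
    breaker.record_success()
    return value
```

While the circuit is open, the node keeps answering from local state and does not add load to a struggling Redis instance.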

### Performance Optimizations

- **Local Decision Making**: Most decisions are made locally, avoiding Redis round trips
- **Asynchronous Updates**: Background synchronization minimizes latency
- **Batched Operations**: Multiple counters are retrieved in a single Redis operation
- **Multiple Replay Workers**: Parallel processing of synchronization tasks
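
Batched retrieval can be sketched as below. The helper is hypothetical: `mget` is any callable with the semantics of the Redis `MGET` command, meaning it takes a list of keys and returns the values in the same order, with `None` for missing keys. Treating a missing counter as zero is an assumption of this sketch.

```python
def fetch_counters(mget, keys: list) -> dict:
    """Fetch many counters in a single round trip.

    mget: callable with Redis MGET semantics (see the lead-in above).
    """
    if not keys:
        return {}  # avoid a round trip when there is nothing to fetch
    values = mget(keys)
    return {
        key: int(value) if value is not None else 0
        for key, value in zip(keys, values)
    }
```

Fetching the current and previous window counters together this way costs one round trip instead of two.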

## Deployment Considerations

### Redis Configuration

For production deployments:

1. Use Redis with appropriate persistence settings
2. Consider Redis Cluster with replicas for high availability
3. Position Redis instances close to API nodes to minimize latency

### Scaling

The system scales horizontally with minimal coordination overhead:

1. Add more API nodes to handle increased request volume
2. Scale Redis independently based on counter volume
3. Each API node operates independently with eventual consistency

This architecture strikes a balance between accuracy, performance, and operational simplicity, providing robust rate limiting across a distributed system without complex clustering mechanisms.